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Abstract 

Given a linear system in a real or complex domain, linear regression 
aims to recover the model parameters from a set of observations. Re- 
cent studios in compressive sc^nsing have successfully shown that under 
certain conditions, a linear program, namely, £i-minimization, guar- 
antees recovery of sparse parameter signals even when the system is 
underdctermined. In this paper, we consider a more challenging prob- 
lem: when the phase of the output measurements from a linear system 
is omitted. Using a lifting technique, we show that even though the 
phase information is missing, the sparse signal can be recovered ex- 
actly by solving a simple semidefinite program when the sampling rate 
is sufficiently high, albeit the exact solutions to both sparse signal re- 
covery and phase retrieval are combinatorial. The results extend the 
type of applications that compressive sensing can be applied to those 
where only output magnitudes can be observed. We demonstrate the 
accuracy of the algorithms through theoretical analysis, extensive sim- 
ulations and a practical experiment. 

1 Introduction 

Linear models, e.g. y = Ax, are by far the most used and useful type of 
model. The main reasons for this are their simplicity of use and identifi- 
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cation. For the identification, the least-squares (LS) estimate in a complex 
domain is computed bj|^ 
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assuming the output y G and A £ £,Nxn given. Further, the LS 
problem has a unique solution if the system is full rank and not underdeter- 
mined, i.e. N > n. 

Consider the alternative scenario when the system is underdetermined, 
i.e. n > N. The least squares solution is no longer unique in this case, and 
additional knowledge has to be used to determine a unique model parameter. 
Ridge regression or Tikhonov regression iHoerl and Kennard 1970 is one 



of the traditional methods to apply in this case, which takes the form 

. 1„ 



argmm - 

X ^ 



y 



Ax\\l + A||a;"^ 



2' 



where A > is a scalar parameter that decides the trade off between fit in 
the first term and the ^2-iiorm of x in the second term. 

Thanks to the £2-norm regularization, ridge regression is known to pick 
up solutions with small energy that satisfy the linear model. In a more 
recent approach stemming from the LASS O [Tibsharani 



pressive sensing (CS) [Candes et al. 2006, Donoho 



2006 



1996 and com- 



another convex 

regularization criterion has been widely used to seek the sparsest parameter 
vector, which takes the form 
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Depending on the choice of the weight parameter A, the program ([s]) has been 
known as the LASSO by Tibsharani 1996| , basis pursuit denoising (BPDN) 
by Chen et al.|[1998j , or £i-minimization (^i-min) by Candes et al.||2006] . In 
recent years, several pioneering works have contributed to efficiently solving 
sparsity minimization problems such as Tropp 2004, Beck and TebouUe 



2009 , Bruckstein et al. , 2009 , especially when the system parameters and 
observations are in high-dimensional spaces. 

In this paper, we consider a more challenging problem. In a linear model 
y = Ax, rather than assuming that y is given, we will assume that only the 
squared magnitude of the output is observed: 



\yi\^ = \{x,ai)\^, i 



1, 



(4) 



^Our derivation in this paper is primarily focused on complex signals, but the results 
should be easily extended to real domain signals. 
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where A" = [ai,-- - ,a^] G C"^^, = [yi,--- ,yjv] G C^^^ and 
denotes the Hermitian transpose of A. This is clearly a more challenging 
problem since the phase of y is lost when only its (squared) magnitude is 
available. A classical example is that y represents the Fourier transform of a;, 
and that only the Fourier transform modulus is observable. This scenario 
arises naturally in several practical applications such as optics [Walt her" 



1963, Millane, 1990 , coherent diffraction imaging fFienup 1987 , and as- 



tronomical imaging Dainty and Fienup 1987j and is known as the phase 
retrieval problem. 

We note that in general phase cannot be uniquely recovered regardless 
whether the linear model is overdetermined or not. A simple example to see 
this, is if xq G C" is a solution to y = Ax, then for any scalar c E C on the 



unit circle cxq leads to the same squared output h. As mentioned in Candes 



et al. , 2011a , when the dictionary A represents the unitary discrete Fourier 



transform (DFT), the ambiguities may represent time-reversed solutions or 
time-shifted solutions of the ground truth signal xq. These global ambigu- 
ities caused by losing the phase information are considered acceptable in 
phase retrieval applications. From now on, when we talk about the solution 
to the phase retrieval problem, it is the solution up to a global phase am- 
biguity. Accordingly, a unique solution is a solution unique up to a global 
phase. 

Further note that since Q is nonlinear in the unknown x, N ^ n 
measurements are in general needed for a unique solution. When the number 
of measurements A'^ are fewer than necessary for a unique solution, additional 
assumptions are needed to select one of the solutions (just like in Tikhonov, 
Lasso and CS). 

Finally, we note that the exact solution to either CS and phase retrieval is 
combinatorially expensive [Chen et al. 1998, Candes et al. , 2011c . There- 
fore, the goal of this work is to answer the following question: Can we 
effectively recover a sparse parameter vector x of a linear system up to a 
global ambiguity using its squared magnitude output measurements via con- 
vex programming? The problem is referred as compressive phase retrieval 
(CPR) [Moravec et al.[[2M7|. 



The main contribution of the paper is a convex formulation of the sparse 
phase retrieval problem. Using a lifting technique, the NP-hard problem is 
relaxed as a semidefinite program. We also derive bounds for guaranteed 
recovery of the true signal and compare the performance of our CPR algo- 
rithm with traditional CS and PhaseLift Candes et al. , 2011a algorithms 
through extensive experiments. The results extend the type of applications 
that compressive sensing can be applied to; namely, applications where only 
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magnitudes can be observed. 



1.1 Background 



Our work is motivated by the £i-min problem in CS and a recent PhaseLift 
technique in phase retrieval by Candes et al. 2011c| . On one hand, the 
theory of CS and £i-min has been one of the most visible research topics in 
recent years. There are several comprehensive review papers that cover the 
literature of CS and related optimization techniques in linear programming. 



The reader is referred to the works of Candes and Wakin 2008 , Bruckstein 



et al., 2009, Loris, 2009, Yang et al. 2010 . On the other hand, the fusion 



of phase retrieval and matrix completion is a novel topic that has recently 
been studied in a selected few papers, such as [Candes et al. 2011c|a . The 



fusion of phase retrieval and CS was discussed in Moravec et al. , 2007 . In 



the rest of the section, we briefly review the phase retrieval literature and 
its recent connections with CS and matrix completion. 

Phase retrieval has been a longstanding problem in optics and x-ray 



crystallography since the 1970s Kohler and Mandel, 1973, Gonsalves 19761. 



Early methods to recover the phase signal using Fourier transform mostly 
relied on additional information about the signal, such as band limitation, 
nonzero support, real-valuedness, and nonnegativity. The Gerchberg-Saxton 
algorithm was one of the popular algorithms that alternates between the 
Fourier and inverse Fourier transforms to obtain the phase estimate iter- 
atively Gerchberg and Saxton, 1972, Fienup, 1982 . One can also utilize 



steepest-descent methods to minimize the squared estimation error in the 
Fourier domain Fienup 1982, Marchesini 2007 . Common drawbacks of 



these iterative methods are that they may not converge to the global solu- 
tion, and the rate of convergence is often slow. Alternatively, [Balan et al. 
(2006 have studied a frame-theoretical approach to phase retrieval, which 



necessarily relied on some special types of measurements. 

More recently, phase retrieval has been framed as a low-rank matrix 
completion problem in Chai et al. 2010 , Candes et al. , 2011a|c . Given 
a system, a lifting technique was used to approximate the linear model 
constraint as a semidefinite program (SDP), which is similar to the objective 
function of the proposed method only without the sparsity constraint. The 
authors also derived the upper-bound for the sampling rate that guarantees 
exact recovery in the noise- free case and stable recovery in the noisy case. 



We are aware of the work by Moravec et al. 12007 , which has considered 



compressive phase retrieval on a random Fourier transform model. Lever- 
aging the sparsity constraint, the authors proved that an upper-bound of 
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0(A;^ log(4n/A;^)) random Fourier modulus measurements to uniquely spec- 
ify /c-sparse signals. Moravec et al. 2007 also proposed a greedy compressive 
phase retrieval algorithm. Their solution largely follows the development of 
£i-min in CS, and it alternates between the domain of solutions that give 
rise to the same squared output and the domain of an ^i-ball with a fixed 
^i-norm. However, the main limitation of the algorithm is that it tries to 
solve a nonconvex optimization problem and that it assumes the £i-norm of 
the true signal is known. No guarantees for when the algorithm recovers the 
true signal can therefore be given. 



2 CPR via SDP 

In the noise free case, the phase retrieval problem takes the form of the 
feasibility problem: 

find X subj. to b = = {a,^a;a;^aj}i<j<Ar, (5) 

where fo^ = [6i, • • • , 6iv] G M^^^. This is a combinatorial problem to solve: 
Even in the real domain with the sign of the measurements {aj}^^ C 
{— 1, 1}, one would have to try out combinations of sign sequences until 
one that satisfies 

ai\/bi = o-fx, i = 1, ■ ■ ■ , N, (6) 

for some x € M" has been found. For any practical size of data sets, this 
combinatorial problem is intractable. 

Since ^ is nonlinear in the unknown x, N ^ n measurements are in 
general needed for a unique solution. When the number of measurements N 
are fewer than necessary for a unique solution, additional assumptions are 
needed to select one of the solutions. Motivated by compressive sensing, we 
here choose to seek the sparsest solution of CPR satisfying ([s]) or, equivalent, 
the solution to 

min||a;||o, subj. to b = \Ax\'^ = {af xx^ ai\i<i<fq . (7) 

As the counting norm || • ||o is not a convex function, following the £i-norm 
relaxation in CS, ([T]) can be relaxed as 

minlla^lli, subj. to b = \Ax\'^ = {af xx^ ai\i<i<N ■ (8) 

X 

Note that ([s]) is still not a linear program, as its equality constraint is not 
a linear equation. In the literature, a lifting technique has been extensively 
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used to reframe problems such as ([s]) to a standard form in semidefinite 



programming, such as in Sparse PCA d'Aspremont et al. , 2007 



More specificaUy, given the ground truth signal xq G C", let Xq 
xqXq G (j^nxn g^j^ induced rank-1 semidefinite matrix. Then the co: 
pressive phase retrieval (CPR) problem can be cast a^ 



minx 11^ 111 
subj. to bi = Tr{af^ Xtti), i = 
rank(X) = 1,X ^ 0. 



(9) 



This is of course still a non-convex problem due to the rank constraint. 
The lifting approach addresses this issue by replacing rank(X) with Tr(X). 
For a semidefinite matrix, Tr(X) is equal to the sum of the eigenvalues of 
X (or the ^i-norm on a vector containing all eigenvalues of X). This leads 
to an SDP 



minx Tr(X) + A||X||i 
subj. to bi = Tr($iX), i 
XhO, 



I,-.. ,N, 



(10) 



where we further denote = aia^ G (^nxn where A > is a design 
parameter. Finally, the estimate of x can be found by computing the rank-1 
decomposition of X via singular value decomposition. We will refere to the 



formulation (10) as compressive phase retrieval via lifting (CPRL). 



We compare (10) to a recent solution of PhaseLift by Chai et al. 2010 , 



Candes et al. 2011c . In Chai et al. [2010 , Candes et al. 2011c , a similar 
objective function was employed for phase retrieval: 



minx Tr(X) 
subj. to bi = Tr(^>iX), i 
XhO, 



I,-.. ,A^, 



(11) 



albeit the source signal was not assumed sparse. Using the lifting technique 
to construct the SDP relaxation of the NP-hard phase retrieval problem. 



with high probability, the program (11) recovers the exact solution (sparse or 



dense) if the number of measurements A^ is at least of the order of 0{n log n). 
The region of success is visualized in Figure [T] as region I. 

If X is sufficiently sparse and random Fourier dictionaries are used for 



sampling, Moravec et al. 12007 showed that in general the signal is uniquely 



^In this paper, for a matrix X denotes the entry-wise ^i-norm, and ||X||2 denotes 

the Frobenius norm. 



6 



defined if the number of squared magnitude output measurements b exceeds 
tlie order of 0(/c^ log(4n/A;^)). Tliis lower bound for tlie region of success of 
CPR is illustrated by the dash line in Figure [T] 

Finally, the motivation for introducing the £i-norm regularization in (10) 



is to be able to solve the sparse phase retrieval problem for N smaller than 
what PhaseLift requires. However, one will not be able to solve the compres- 
sive phase retrieval problem in region III below the dashed curve. Therefore, 
our target problems lie in region II. 




Figure 1: An illustration of the regions of importance in solving the phase 
retrieval problem. While PhaseLift primarily targets problems in region I, 
CPRL operates primarily in regions II and III. 



Example 1 (Compressive Phase Retrieval). In this example, we illustrate 
a simple CPR example, where a 2-sparse complex signal xq G C^^ is first 
transformed by the Fourier transform F E ([;;;64x64 jQUgyj^d random pro- 



7 



jections R £ C 



32x64 



(generated by sampling a unit complex Gaussian): 



\RFxo\ 



(12) 



Given b, F , and R, we first apply PhaseLift algorithm Gandes et al. 



201 Ic^ with A = RF to the 32 squared observations h. The recovered dense 



signal is shown in Figure]^ As seen in the figure, PhaseLift fails to identify 
the 2-sparse signal. 



Next, we apply CPRL (15), and the recovered sparse signal is also shown 
in Figure GPRL correctly identifies the two nonzero elements in x . 
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Figure 2: The magnitude of the estimated signal provided by CPRL and 
PhaseLift (PL). CPRL correctly identifies elements 2 and 24 to be nonzero 
while PhaseLift provides a dense estimate. It is also verified that the esti- 
mate from CPRL, after a global phase shift, is approximately equal the true 

Xq. 



3 Stable Numerical Solutions for Noisy Data 

In this section, we consider the case that the measurements are contaminated 
by data noise. In a linear model, typically bounded random noise affects 
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the output of the system as y = Ax + e, where e G is a noise term with 
bounded ^2-iiorm: ||e||2 < e. However, in phase retrieval, we fohow closely 
a more special noise model used in Candes et al. [2011c : 



\{x,ai)\'^ + ei 



(13) 



This nonstandard model avoids the need to calculate the squared magnitude 
output |yp with the added noise term. More importantly, in practical phase 
retrieval applications, measurement noise is introduced when the squared 
magnitudes or intensities of the linear system are measured, not on y itself 
dCandes et al]|2011c| ). 

Accordingly, we denote a linear operator B of X as 



B : X £ 



^ {Tr($,X)}i<,<^ e 



(14) 



which measures the noise- free squared output. Then the approximate CPR 
problem with bounded I2 error model (13) can be solved by the following 
SDP program: 

min Tr{X) + X\\X\\i 
subj. to \\B{X) - 6II2 < e, (15) 
X hO. 

The estimate of x, just as in noise free case, can finally be found by com- 
puting the rank-1 decomposition of X via singular value decomposition. We 
refer to the method as approximate CPRL. Due to the machine rounding 



error, in general a nonzero e should be always assumed in the objective (15) 
and its termination condition during the optimization. 

We should further discuss several numerical issues in the implementa- 



tion of the SDP program. The constrained CPRL formulation (15) can be 
rewritten as an unconstrained objective function: 



minTr(X) + AIIXI 



12) 



(16) 



where A > and fi > are two penalty parameters. 



In (16), due to the lifting process, the rank-1 condition of X is approx- 



imated by its trace function Tr{X). In Candes et al. [2011c , the authors 
considered phase retrieval of generic (dense) signal x. They proved that 
if the number of measurements obeys > cnlogn for a sufficiently large 



constant c, with high probability, minimizing ( 16 ) without the sparsity con- 
straint (i.e. A = 0) recovers a unique rank-1 solution obeying X* = xx^ . 



See also Recht et al. 2010 
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In Section [7j we will show that using either random Fourier dictionaries 
or more general random projections, in practice, one needs much fewer mea- 
surements to exactly recover sparse signals if the measurements are noise 
free. Nevertheless, in the presence of noise, the recovered lifted matrix X 
may not be exactly rank-1. In this case, one can simply use its rank-1 
approximation corresponding to the largest singular value of X. 



We also note that in (16), there are two main parameters A and /i that 
can be defined by the user. Typically ^ is chosen depending on the level of 
noise that affects the measurements h. For A associated with the sparsity 
penalty ||-^||i, one can adopt a warm start strategy to determine its value 
iteratively. The strategy has been widely used in other sparse optimization. 



such as in £i-min [Yang et al. 2010 . More specifically, the objective is 



solved iteratively with respect to a sequence of monotonically decreasing 
A, and each iteration is initialized using the optimization results from the 
previous iteration. The procedure continues until a rank 1 solution has 
been found. When A is large, the sparsity constraint outweighs the trace 
constraint and the estimation error constraint, and vice versa. 

Example 2 (Noisy Compressive Phase Retrieval) . Let us revisit Example [7] 
hut now assume that the measurements are contaminated by noise. Using 
exactly the same data as in Example^but adding uniformly distributed mea- 
surement noise between —1 and 1, CPRL was able to recover the 2 nonzero 
elements. PhaseLift, just as in Example^ gave a dense estimate of x. 

4 Computational Aspects 

In this section we discuss computational issus of the proposed SDP formula- 
tion, algorithms for solving the SDP and to efficient approximative solution 
algorithms. 

4.1 A Greedy Algorithm 



Since (10) is an SDP, it can be solved by standard software, such as CVX 



[Grant and Boyd 2010 . However, it is well known that the standard tool- 



boxes suffer when the dimension of X is large. We therefore propose a greedy 



approximate algorithm tailored to solve (10). If the number of nonzero el- 



ements in x is expected to be low, the following algorithm may be suitable 



10 



and less computationally heavy compare to approaching the original SDP: 



Algorithm 1: Greedy Compressive Phase Retrieval via Lifting 
(GCPRL) 
Set T = and let 7 > 0, e > 0. 
repeat 

for /t = I, - - ,iV, do 

Set Xfc = X|J{fc} and solve Xk = 

argminx^oTV(X(^fc)) + lY.tlib^ - Tr(af ^^af 
|_ Let Wk denote the corresponding objective value. 
Let p be such that Wp <Wk,k = I,- ■ ■ ,N. Set X = I[j{p} and 
X = Xp. 
until Wp < e; 



Example 3 (GCPRL ability to solve the CPR problem). To demonstrate 
the effectiveness of GCPRL let us consider a numerical example. Let the 
true xq G C" be a k -sparse signal, let the nonzero elements be randomly 
chosen and their values randomly distributed on the complex unit circle. 
Let A G ([;;^xn g^jig^^^i^d 5^ sampling from a complex unit Gaussian 
distribution. 

If we fix n/N = 2, that is, twice as many unknowns as measurements, 
and apply GCPRL for different values of n, N and k we obtain the com- 
putational times visualized in the left plot of Figure In all simulations 
7 = 10 and e = 10~^ are used in GCPRL. The true sparsity pattern was 
always recovered. Since GCPRL can be executed in parallel, the simulation 
times can be divided by the number of cores used (the average run time in 
Figure is computed on a standard laptop running Matlab, 2 cores, and 
using CVX to solve the low dimensional SDP of GCPRL). The algorithm is 
several magnitudes faster than the standard interior-point methods used in 
CVX. 



5 The Dual 

CPRL takes the form: 

minx Tr(X) + A||X||i 
subj. to bi = Ti:{^iX);i = l,--- ,N, (17) 
XhO. 
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Figure 3: Average run time of GCPRL in Matlab CVX environment. 



If we define h{X) to be an A^-dimensional vector such that our constraints 
are h{X) = 0, then we can equivalently write: 

minx ™ax^,y,z=z^ 

Tr(X) + Tr(ZX) + ^^h{X) - Ti{YX) 
subj. to y ^ 0, (18) 

||-^||oo ^ A. 

Then the dual becomes: 

max^z=zH l^^h 

subj. to ||Z||oo < A. (19) 
y :=/ + Z-Eti/^i^i^O. 

6 Analysis 

This section contains various analysis results. The analysis follows that of 



CS and have been inspired by derivations given in Candes et al. 


2011c 


2006 


Donoho 


2006 


Candes 


2008 


Berinde et al. 


2008 , Bruckstein et al. 


20091 



The analysis is divided into four subsections. The three first subsections 
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give results based on RIP, RIP-1 and mutual coherency respectively. The 
last subsection focuses on the use of Fourier dictionaries. 



6.1 Analysis Using RIP 

In order to state some theoretical properties, we need a generalization of the 
restricted isometry property (RIP). 

Definition 4 (RIP). We will say that a linear operator B[-) is {e,k)-RIP 
if for all X ^ s.t. \\X\\o < k we have 



BiX)g ^ 



\x\\l 



2 

' < e. (20) 



We can now state the following theorem: 

Theorem 5 (Recoverability /Uniqueness). Let B{-) be a {e,2\\X*\\Q)-RIP 
linear operator with e < 1 and let x be the sparsest solution to (|4]). If X* 
satisfies 

b =B{X*), 
X* >zO, (21) 
vank{X*} =1, 

then X* is unique and X* = xx^ . 

Proof of Theorem^ Assume the contrary i.e., X* ^ xx^ . It is clear that 
ll^^'^llo < ll-^*l|o and hence \\xx^ — X*\\o < 2||X*||o. If we now apply the 
RIP inequality ([20]) on xx^ - X* and use that B{xx^ - X*) = we are 
led to the contradiction 1 < e. We therefore conclude that X* is unique and 
X* = xx". □ 

We can also give a bound on the sparsity of x: 



Theorem 6 (Bound on ||iES^||o from above). Let x be the sparsest solution 
to Q and let X be the solution of CPRL (10). If X has rank 1 then 

\\X\\q > \\xX^\\q. 



Proof of Theorem^ Let X be a rank-1 solution of CPRL (10). By contra- 
diction, assume \\X\\o < \\xx^\\q. Since X satisfies the constraints of (|4]), it 
must give a lower objective value than xx^ in ^ . This is a contradiction 
since xx^ was assumed to be the solution of ([4j). Hence we must have that 
||X||o > II^S^IIo- □ 
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Corollary 7 (Guaranteed Recovery Using RIP). Let x be the sparsest so- 
lution to Q. The solution of CPRL (10), X, is equal to xx^ if it has rank 
1 and B is (e,2\\X\\Q)-RIP with e < 1. 

Proof of Corollary^ This follows trivially from Theorem [5] by realizing that 
X satisfy all properties of X* . 

□ 

If xx^ = X can not be guaranteed, the following bound could come 
useful: 

Theorem 8 (Bound on ||X* — XII2). Let e < ^ r- and assume B{-) to 
be a {e,2k)-RLP linear operator. Let X* be any matrix (sparse or dense) 
satisfying 

h =B{X*), 
X* ^0, (22) 
rank{X*} =1, 



let X be the CPRL solution, ( |10[ ), and form Xs from X* by setting all but 
the k largest elements to zero, i.e.. 



Xs= argmin \\X* - X\\i. (23) 

X:||X||o<fc 



Then, 



||^-^*||2 <- —=\\X*-Xs\\l 

+(2(1 - p)-^ + k~^/^)\{TrX* - TtX) (24) 
A 

with p = V2e/{1 - e). 

Proof of Theorem fsl The proof is inspired by the work on compressed sens- 



ing presented in Candes 2008 



First, we introduce A = X — X* . For a matrix X and an index set 
T, we use the notation Xt to mean the matrix with all zeros except those 
indexed by T, which are set to the corresponding values of X. Then let 
To be the index set of the k largest elements of X* in absolute value, and 
Tq = {(1, 1), (1, 2), ... , (n, n)} \ Tq be its complement. Let Ti be the index 
set associated with the k largest elements in absolute value of A^c and 
To^i = Tq U Ti be the union. Let T2 be the index set associated with the k 
largest elements in absolute value of Ayc^, and so on. 
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Notice that 

l|A||2 = IIAto,, +At,=_J|2 < ||ATo,J|2 + ||ATeJ|2. (25) 

We will now study each of the two terms on the right hand side separately. 

We first consider ||Atc^||2. For j > 1 we have that for each i E Tj and 
i' G Tj_i that |A[i]| < |A[i']|. Hence ||AtJ|oo < || At,_ J|i/A;. Therefore, 

||Ar,||2 < A:1/'||AtJoo < At,_J|i (26) 

and 

IIAt^c JI2 < ||At,||2 < k-'/'\\ATsh. (27) 
i>2 

Now, since X minimizes Tr X + A||X||i, we have 

TrX* + A||X*||i > TrX + A||X||i 

> TrX + A(||X^.J|i-||Aro||i (28) 
+ ||At„H|i-||^^^,||i). 



Hence, 



|Atc||i < ^TY a - \\X*tJi + \\AtJi + llX^elli + ||X*||i. (29) 



Using the fact ||X^c||i = ||X* - Xs\\i = \\X*\\i - ||X^J|i, we get a bound 
for ||Atc||i: 

||ATe||i< ^TrA + ||Aro||i + 2||X^e||i. (30) 
Subsequently, the bound for ||A7-cJ|2 is given by 

||AT„^J|2<fc-i/2(^TrA + ||ATol|i + 2||X^,.||i) (31) 
<A;-V2(^XrA + 2||X^.||i) + ||AT„||2. (32) 
Next, we consider || Aj-q ^ II2. It can be shown by a similar derivation as 

in 



Candes 2008 that 



IIAT0JI2 < ^k-y'\\X* - Xsh - Xr^^Tr A. (33) 
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Lastly, combine the bounds for ||Aj'cJ|2 and ||Atqi||2) and we get the 
final result: 

||A||2 <||Ato,i||2 + ||At,cJ|2 (34) 
< - k-^^^\ Tr A + 2/c~^/2||X^c||i + 2||Aro,i||2 (35) 
<- (2{1- p)-^ + k-^/^^jTrA (36) 
+ 2{1- p)-^k-'^/^\\X* - XsWi. (37) 

□ 

The bound given in Theorem [8] is rather impractical since it contains 
both \\X — X* II 2 and Tr(X — X*). The weaker bound given in the following 
corollary does not have this problem: 

Corollary 9 (A Practical Bound on ||X-X*||2). The hound on ||X-X*||2 
in Theorem\^ can be relaxed to a weaker bound: 

-(fr-p + ^)\l^-X-h (38) 

If X* is k-sparse, e < j:^^! and B{-) is an {e,2k)-RIP linear operator, then 
we can guarantee that X = X* if 

A > ^ + 1 (40) 
1- p 

and X has rank 1. 

Proof of Corollary^ It follows from the assumptions of Theorem [8] that 

l-p = l-^>l- =0. (41) 

Hence, 

(2(1-p)-VA;-^/2) i >0. (42) 
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Therefore, we have 



\X -X* 



< 



\X* - X, 



{l-p)Vk' 

+ (2 (1 - p)-^ + A:-i/2^ i(TrX* - Trl) 
2 

< ^11^* -^slli 



X* - x\ 



< 



{l-p)Vk 



\X* - X, 



+ f2(l-p)-i + A:-i/2 



A 



X* -X\ 



(43) 

(44) 
(45) 

(46) 
(47) 

(48) 



which is equal to the proposed condition after a rearrangement of the terms. 

□ 

Given the above analysis, however, it may be the case that the linear 
operator B[-) does not satisfy the RIP property defined in Definition |4| as 
pointed out in Candes et al. 2011c . Therefore, next we turn our attention 



to RIP-1 linear operators. 

6.2 Analysis Using RIP-1 

We define RIP-1 as follows: 

Definition 10 (RIP-1). A linear operator B{-) is {e, k)-RIP-l if for all 
matrices X 7^ subject to \\X\\q < k, we have 



\B{X)\U 



X 



1 



< e. 



(49) 



Theorems [5]-[6] and Corollary [7] all hold with RIP replaced by RIP-1. 
The proofs follow those of the previous section with minor modifications 
(basically replace the 2-norm with the £i-norm). The RIP-1 counterparts of 
Theorems [5]-[6] and Corollary [7] are not restated in details here. Instead we 
summarize the most important property in the following theorem: 

Theorem 11 (Upper Bound & Recoverability Through £1). Let x he the 

sparsest solution to Q. The solution of CPRL (10), X, is equal to xx^ if 
it has rank 1 and B{-) is (e,2\\X\\Q)-RIP-l with e < 1. 
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Proof of Theorem 1 1 , The proof follows trivially for the proof Theorem [7] 
(basically replace the 2-norm with the £i-norm). 

□ 



6.3 Analysis Using Mutual Coherence 



The RIP type of argument may be difficult to check for a given matrix and 
are more useful for claiming results for classes of matrices/linear operators. 
For instance, it has been shown that random Gaussian matrices satisfy the 
RIP with high probability. However, given one sample of a random Gaussian 
matrix, it is hard to check if it actually satisfies the RIP or not. 



Two alternative arguments are spark Chen et al. 19981 and mutual co 



herence Donoho and Elad [2003 , Candes et al. 2011b . The spark condition 
usually gives tighter bounds but is known to be difficult to compute. On 
the other hand, mutual coherence may give less tight bounds, but is more 
tractable. We will focus on mutual coherence here. 
Mutual coherence is defined as: 

Definition 12 (Mutual Coherence). For a matrix A, define the mutual 
coherence as 



^i{A) 



affa 



max 



'J I 



l<k,j<n,k=/=j \\ak\\2\\Clj\\2 



(50) 



By an abuse of notation, let B be the matrix satisfying b = BX^ with X'^ 
being the vectorized version of X. We are now ready to state the following 
theorem: 

Theorem 13 (Recovery Using Mutual Coherence). Let x he the sparsest 
solution to Q. The solution of CPRL (10), X, is equal to xx^ if it has 
rank 1 and \\X\\q < 0.5(1 + l//x(-B)). 

B{xx^), it follows from 



Proof of Theorem[T^ Since xx^ satisfies b 
iDonoho and Elad[ |2003[ Thm. 1] that 



|a;a;'^||o < 



1 



1 + 



1 



(51) 



2 V' ' KB) 

is a sufficient condition for xx^ to be a unique solution. It further follows 
that if X also satisfies (51) then we must have that X 
satisfies b = B{X). 



xx^ since X also 

□ 
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7 Experiment 



This section gives a number of comparisons with the other state-of-the- 
art methods in compressive phase retrieval. Code for the numerical illus- 
trations can be downloaded from 'http : //www.rt . isy.liu.se/~ohlssoii/ 
.code .html. 



7.1 Simulation 

First, we repeat the simulation given in Example [T] for k = 1, . . . , 5. For 
each A:, n = 64 is fixed, and we increase the measurement dimension N until 
CPRL recovered the true sparse support in at least 95 out of 100 trials, i.e., 
95% success rate. New data (x, b, and R) are generated in each trial. The 
curve of 95% success rate is shown in Figure |4] 




1 1.5 2 



Figure 4: The curves of 95% success rates for CPRL, PhaseLift, and CS. 
Note that in the CS scenario, the simulation is given the complete output y 
instead of its squared magnitudes. 

With the same simulation setup, we compare the accuracy of CPRL with 
the PhaseLift approach and the CS approach in Figure |4j First, note that 
CS is not applicable to phase retrieval problems in practice, since it assumes 
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the phase of the observation is also given. Nevertheless, the simulation shows 
CPRL via the SDP solution only requires a slightly higher sampling rate to 
achieve the same success rate as CS, even when the phase of the output is 
missing. Second, similar to the discussion in Example [T| without enforcing 
the sparsity constraint in (11), PhaseLift would fail to recover correct sparse 



signals in the low sampling rate regime. 

It is also interesting to see the performance as n and N vary and k held 
fixed. We therefore use the same setup as in Figure |4] but now fixed k = 2 
and for n = 10, . . . , 60, gradually increased until CPRL recovered the true 
sparsity pattern with 95% success rate. The same procedure is repeated to 
evaluate PhaseLift and CS. The results are shown in Figure [5| 




Figure 5: The curves of 95% success rate for CPRL, PhaseLift, and CS. 
Note that the CS simulation is given the complete output y instead of its 
squared magnitudes. 

Compared to Figure|4j we can see that the degradation from CS to CPRL 
when the phase information is omitted is largely affected by the sparsity of 
the signal. More specifically, when the sparsity k is fixed, even when the 
dimension n of the signal increases dramatically, the number of squared 
observations to achieve accurate recovery does not increase significantly for 
both CS and CPRL. 
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Next, we calculate the quantity ^ 
when 

1 



1 + 



1 



l^llo < 7^ 1 



/.(B) 
1 



as Theorem 



13 



shows that 



(52) 



and X has rank 1, then X* = X. The quantity is plotted for a number of 
different N and n's in Figure |6j From the plot it can be concluded that if the 
solution X has rank 1 and only a single nonzero component for a choice of 
2000 > n > 10, 45 > iV > 5, Theorem [l3| can guarantee that X = X* . We 
also observer that Theorem [13] is pretty conservative, since from Figure [5] 
we have that with high probability we needed > 25 to guarantee that a 
two sparse vector is recovered correctly for n = 20. 



120 
110 



100 




Figure 6: A contour plot of the quantity 2(1 + JI(b)) • ^ taken as the 
average over 10 realizations of B. 



7.2 Audio Signals 

In this section, we further demonstrate the performance of CPRL using 
signals from a real-world audio recording. The timbre of a particular note 
on an instrument is determined by the fundamental frequency, and several 
overtones. In a Fourier basis, such a signal is sparse, being the summation of 
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a few sine waves. Using the recording of a single note on an instrument will 
give us a naturally sparse signal, as opposed to synthesized sparse signals in 
the previous sections. Also, this experiment will let us analyze how robust 
our algorithm is in practical situations, where effects like room ambience 
might color our otherwise exactly sparse signal with noise. 

Our recording 2; G M'^ is a real signal, which is assumed to be sparse in 
a Fourier basis. That is, for some sparse x G C^ , we have z = Finv 

X, where 

Finv £ C^^" is a matrix representing a transform from Fourier coefficients 
into the time domain. Then, we have a randomly generated mixing matrix 
with normalized rows, R G M^^**, with which our measurements are sampled 
in the time domain: 

y = Rz = RFinvX. (53) 

Finally, we are only given the magnitudes of our measurements, such that 
b = |yp = \Rz\'^. 

For our experiment, we choose a signal with s = 32 samples, = 30 
measurements, and it is represented with n = 2s (overcomplete) Fourier 
coefficients. Also, to generate Finv, the C"^" matrix representing the Fourier 
transform is generated, and s rows from this matrix are randomly chosen. 

The experiment uses part of an audio file recording the sound of a tenor 
saxophone. The signal is cropped so that the signal only consists of a single 
sustained note, without silence. Using CPRL to recover the original audio 
signal given b, i?, and Finv, the algorithm gives us a sparse estimate £c, which 
allows us to calculate Zf>st = FinvX. We observe that all the elements of z^st 
have phases that are vr apart, allowing for one global rotation to make z^st 
purely real. This matches our previous statements that CPRL will allow us 
to retrieve the signal up to a global phase. 

We also find that the algorithm is able to achieve results that capture 
the trend of the signal using less than s measurements. In order to fully 
exploit the benefits of CPRL that allow us to achieve more precise estimates 
with smaller errors using fewer measurements relative to s, the problem 
should be formulated in a much higher ambient dimension. However, using 
the CVX Matlab toolbox by Grant and Boyd [2010 , we already ran into 



computational and memory limitations with the current implementation of 
the CPRL algorithm. These results highlight the need for a more efficient 
numerical implementation of CPRL as an SDP problem. 
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Figure 7: The retrieved signal z^st using CPRL versus the original audio 
signal z. 

8 Conclusion and Discussion 



A novel method for the compressive phase retrieval problem has been pre- 
sented. The method takes the form of an SDP problem and provides the 
means to use compressive sensing in applications where only squared mag- 
nitude measurements are available. The convex formulation gives it an edge 
over previous presented approaches and numerical illustrations show state 
of the art performance. 

One of the future directions is improving the speed of the standard SDP 
solver, i.e., interior-point methods, currently used for the CPRL algorithm. 
The authors have previously introduced efficient numerical acceleration tech- 
niques for £i-min [Yang et al. 2010 and Sparse PCA Naikal et al. , 2011 



problems. We believe similar techniques also apply to CPR. Such accelerated 
CPR solvers would facilitate exploring a broad range of high-dimensional 
CPR applications in optics, medical imaging, and computer vision, just to 
name a few. 
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Figure 8: The magnitude of x retrieved using CPRL. The audio signal Zest 
is obtained by Zgst = FinvX. 
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